FEASIBILITY OF DEVELOPING A COMMON U.S. ARMY HELICOPTER PILOT
CANDIDATE  SELECTION  SYSTEM:  ANALYSIS  OF  U.S.  AIR  FORCE  DATA 


EXECUTIVE  SUMMARY 


Research  Requirement: 

Personnel Decisions Research Institute (PDRI), with two subcontractors, Damos
Aviation Services (DAS) and the American Institutes for Research (AIR), is developing and
validating a selection system for U.S. Army helicopter pilot candidates. One of the project’s
stated goals is the development of a selection battery that can be administered over the Internet.
The  project  staff  is  reviewing  pilot  selection  systems  currently  used  by  the  U.S.  Air  Force 
(USAF)  and  the  U.S.  Navy  (USN)  to  determine  if  the  existing  pilot  selection  tools  would  be 
relevant  and  useful  for  selecting  Army  aviators. 

This  report  summarizes  analyses  of  the  Air  Force  Officer  Qualifying  Test  (AFOQT)  and 
addresses  the  basic  question,  “Can  a  common  selection  system  be  developed  from  existing  tests 
that  has  sufficient  variance  to  discriminate  among  pilot  candidates  from  both  the  enlisted  and 
officer  populations?”  The  U.S.  Army’s  aviator  candidate  pool,  unlike  the  aviator  candidate  pools 
for  the  USAF  and  USN,  includes  military  enlisted  personnel  and  civilians,  many  of  whom  do  not 
have a four-year college degree. As a result, there is some concern that existing tests such as the
USAF’s  AFOQT  and  the  USN’s  Aviation  Selection  Test  Battery  (ASTB)  may  be  too  difficult  for 
a  substantial  subset  of  Army  aviator  candidates,  and  thus  would  not  produce  a  sufficient  spread 
of  scores  at  important  selection  points. 

Procedure: 

The  normative  sample  for  the  soon-to-be-implemented  AFOQT  Form  S  was  used  for  the 
current  investigation.  The  sample  consisted  of  Basic  Military  Training  (BMT)  enlisted  personnel 
likely  to  apply  for  the  Airman  Education  and  Commissioning  Program,  Air  Force  Reserve 
Officer  Training  Cadets  (AFROTC),  and  Officer  Training  School  (OTS)  cadets.  The  AFOQT 
analyses  evaluated  the  difficulty  of  the  AFOQT  for  a  sample  of  USAF  personnel  that  should  be 
similar  in  education  level  to  the  U.S.  Army  aviator  enlisted,  ROTC,  and  Officer  Candidate 
School  applicant  populations.  The  primary  analyses  compared  score  distributions  of  the  AFOQT 
subtest  and  composite  scores  for  the  different  sample  sources:  BMT,  AFROTC,  and  OTS. 
Secondary  analyses  compared  the  AFOQT  and  Armed  Services  Vocational  Aptitude  Battery 
(ASVAB)  components  for  those  with  available  ASVAB  data. 

Findings: 

As expected, the AFOQT was more difficult for the Air Force enlisted personnel than for
applicants from the other commissioning sources. However, the subtest and composite score
distributions have enough spread to discriminate well among enlisted personnel if the AFOQT
or a similar aptitude test is used for selection. On the highly speeded subtests of the Pilot
Composite, such as the Instrument Comprehension and Table Reading tests, there was almost
no difference between the examinee subpopulations.

A common selection system for all Army helicopter pilot applicants appears to be
practical, but separate group norms probably will be required so that individual applicants are
rank ordered or eliminated relative to the other members of the applicant’s own group. Direct
conversion from ASVAB to AFOQT results is not recommended except as an interim estimate
of how the enlisted personnel are likely to do on the AFOQT, particularly the new Pilot Composite.

Utilization  and  Dissemination  of  Findings: 

This  work  informs  the  decision  process  for  development  of  a  selection  instrument  for 
Army  aviation  and  its  integration  into  the  accession  process.  It  directly  affects  the  determination 
of  suitability  of  existing  instruments  as  part  of  the  objective  test  battery.  The  Selection 
Instrument  for  Flight  Training  project  research  plan,  milestones,  products  and  recommendations 
for  implementation  were  briefed  to  the  Deputy  Commander,  U.S.  Army  Aviation  Warfighting 
Center  on  6  July  2006. 

FEASIBILITY OF DEVELOPING A COMMON U.S. ARMY HELICOPTER PILOT
CANDIDATE  SELECTION  SYSTEM:  ANALYSIS  OF  U.S.  AIR  FORCE  DATA 


Introduction 

Personnel Decisions Research Institute (PDRI), with two subcontractors, Damos
Aviation Services (DAS) and the American Institutes for Research (AIR), is developing and
validating a selection system for U.S. Army helicopter pilot candidates. One of the project’s
stated goals is the development of a selection battery that can be administered over the Internet.
The  project  staff  is  reviewing  pilot  selection  systems  currently  used  by  the  U.S.  Air  Force 
(USAF)  and  the  U.S.  Navy  (USN)  to  determine  if  the  existing  pilot  selection  tools  would  be 
relevant  and  useful  for  selecting  Army  aviators. 

This  report  summarizes  analyses  of  the  Air  Force  Officer  Qualifying  Test  (AFOQT)  and 
addresses  the  basic  question,  “Can  a  common  selection  system  be  developed  from  existing  tests 
that  has  sufficient  variance  to  discriminate  among  pilot  candidates  from  both  the  enlisted  and 
officer  populations?”  The  U.S.  Army’s  aviator  candidate  pool,  unlike  the  aviator  candidate  pools 
for  the  USAF  and  USN,  includes  military  enlisted  personnel  and  civilians,  many  of  whom  do  not 
have a four-year college degree. As a result, there is some concern that existing tests such as the
USAF’s  AFOQT  and  the  USN’s  Aviation  Selection  Test  Battery  (ASTB)  may  be  too  difficult  for 
a  substantial  subset  of  Army  aviator  candidates,  and  thus  would  not  produce  a  sufficient  spread 
of  scores  at  important  selection  points. 


Background 

The  AFOQT  has  been  used  to  select  all  Air  Force  commissioning  program  applicants, 
except  Air  Force  Academy  and  Medical  Corps  applicants,  for  the  last  50  years.  The  AFOQT 
also  has  regularly  been  used  to  select  non-academy  candidates  for  undergraduate  pilot  training 
(UPT)  and  undergraduate  navigator  training  (UNT)  since  its  implementation.  Currently,  two 
parallel  versions  of  AFOQT  Form  Q  are  used  operationally.  Since  Form  O  was  implemented  in 
the  early  1980s,  the  AFOQT  has  consisted  of  the  16  subtests  shown  in  Table  1 .  Table  1  also 
shows  five  composites  formed  by  adding  the  indicated  subtest  raw  scores.  The  raw  composite 
scores  are  then  converted  into  percentile  scores  based  on  a  1978  normative  sample.  Forms  O 
through  Q  required  4.5  hours  to  administer  (Gould,  1978;  Skinner  &  Alley,  2002). 

Two  revised  parallel  versions  were  developed  as  Form  R  and  normed  based  on  a  new 
2001  sample.  Implementation  of  Form  R  was  delayed  until  an  effort  to  reduce  the  test  length  and 
test  time  was  completed.  The  reduction  effort  concluded  that  five  subtests  (Reading 
Comprehension,  Data  Interpretation,  Mechanical  Comprehension,  Electrical  Maze,  and  Scale 
Reading)  could  be  removed  without  losing  significant  variance  or  changing  the  factor  structure 
of the test (Operational Technologies Corp, 2002). A new 11-subtest version of the AFOQT,
Form S, was produced and placed in operation in Fall 2005 (Gould & Shore, 2003).

Table 1

AFOQT Forms O, P, and Q Subtests and Composites

Subtest                        # of Items  Pilot  Navigator/Technical  Academic Aptitude  Verbal  Quantitative
Verbal Analogies (VA)          25          X      -                    X                  X       -
Arithmetic Reasoning (AR)      25          -      X                    X                  -       X
Reading Comprehension (RC)     25          -      -                    X                  X       -
Data Interpretation (DI)       25          -      X                    X                  -       X
Word Knowledge (WK)            25          -      -                    X                  X       -
Math Knowledge (MK)            25          -      X                    X                  -       X
Mechanical Comprehension (MC)  20          X      X                    -                  -       -
Electrical Maze (EM)           20          X      X                    -                  -       -
Scale Reading (SR)             40          X      X                    -                  -       -
Instrument Comprehension (IC)  20          X      -                    -                  -       -
Block Counting (BC)            20          X      X                    -                  -       -
Table Reading (TR)             40          X      X                    -                  -       -
Aviation Information (AI)      20          X      -                    -                  -       -
Rotated Blocks (RB)            15          -      X                    -                  -       -
General Science (GS)           20          -      X                    -                  -       -
Hidden Figures (HF)            15          -      X                    -                  -       -
Total                          380

An  additional  investigation  resulted  in  changes  to  the  subtest  composition  of  the  Pilot  and 
Navigator/Technical  (NT)  Composites  as  shown  in  Table  2  (Shore  &  Gould,  2003).  Verbal 
Analogies  was  removed  from  the  Pilot  Composite,  and  Arithmetic  Reasoning  and  Math 
Knowledge  were  added.  Rotated  Blocks  and  Hidden  Figures  were  removed  from  the  NT 
composite  and  Verbal  Analogies  was  added.  An  experimental  subtest,  the  Self  Description 
Inventory,  was  added  to  Form  S.  The  complete  test,  with  the  Self  Description  Inventory, 
requires  3.5  hours  to  administer. 

These changes improved the predictive validity of the Pilot and NT composites.
Prediction of UPT attrition increased from r = .10 to .13 and prediction of T-37 training (the first
stage of UPT), from r = .29 to .35. UNT performance prediction increased from r = .33 to .44.
These correlations are uncorrected for restriction in range or unreliability. Their magnitude is
particularly important for UPT because only about 12% of candidates fail or terminate for any
reason (medical, self-initiated elimination, academic failure, flying deficiency, manifestations of
anxiety, or death). Therefore, truncation of scores has a significant depressing effect on the
magnitude of the correlations.

Table 2

AFOQT Form S Subtests and Composites

Subtest                        # of Items  Pilot  Navigator/Technical  Academic Aptitude  Verbal  Quantitative
Verbal Analogies               25          -      X                    X                  X       -
Arithmetic Reasoning           25          X      X                    X                  -       X
Word Knowledge                 25          -      -                    X                  X       -
Math Knowledge                 25          X      X                    X                  -       X
Instrument Comprehension       20          X      -                    -                  -       -
Block Counting                 20          -      X                    -                  -       -
Table Reading                  40          X      X                    -                  -       -
Aviation Information           20          X      -                    -                  -       -
Rotated Blocks                 15          -      -                    -                  -       -
General Science                20          -      X                    -                  -       -
Hidden Figures                 15          -      -                    -                  -       -

Approach 

Two  sets  of  analyses  were  conducted.  The  first  set  of  analyses  compared  score 
distributions  of  the  AFOQT  subtest  and  composite  scores  for  three  different  sample  sources: 
Basic  Military  Training  (BMT),  Air  Force  Reserve  Officer  Training  Corps  (AFROTC),  and 
Officer  Training  School  (OTS).  In  addition,  gender  and  ethnic  differences  in  subtest  and 
composite  scores  were  analyzed  by  examinee  source  using  basic  linear  regression  techniques. 

For these analyses, all the original 16 subtests included in the AFOQT Forms O through Q were
analyzed rather than restricting the analyses to the 11 subtests included in the new AFOQT Form
S. Raw scores were used in the analyses. The same data set was used in regression analyses to
evaluate the impact of gender, ethnicity, and examinee source on test performance.

These  analyses  used  the  normative  sample  for  AFOQT  Form  S.  This  sample  of  USAF 
personnel  should  be  similar  in  education  level  to  the  U.S.  Army  aviator  enlisted,  ROTC,  and 
Officer  Candidate  School  applicant  populations.  The  sample  contained  509  enlisted  personnel 
who took the AFOQT while they were in BMT and scored at the 50th percentile or higher on the
Armed  Services  Vocational  Aptitude  Battery  (ASVAB)  Air  Force  General  (G)  composite.  They 
were  included  in  the  normative  sample  to  represent  the  enlisted  personnel  who  take  the  AFOQT 
while  applying  for  the  Airman  Education  and  Commissioning  program  and  college  students 
applying  for  the  2-  or  4-year  AFROTC  scholarship  programs.  The  sample  also  contained  679 
AFROTC  and  462  OTS  cadets.  The  AFOQT  normative  data  were  collected  in  the  spring  and 
summer  of  2001. 

The second set of analyses investigated relationships between the AFOQT and ASVAB.
These  relationships  are  important  in  determining  if  ASVAB  test  scores  can  serve  as  selection 
indicators  for  Army  enlisted  personnel  who  apply  as  Army  aviator  candidates.  In  addition,  the 
intercorrelations  of  the  ASVAB  subtest  and  composite  scores  and  the  AFOQT  subtest  and 
composite  raw  scores  were  computed  for  the  enlisted  personnel  to  permit  project  personnel  to 
evaluate  the  common  and  unique  variance  relationships  between  the  ASVAB  and  the  AFOQT. 

For  these  analyses,  the  normative  sample  was  matched  with  ASVAB  files  for  FY  2000 
and  2001  accessions.  The  ASVAB  subtest  and  composite  scores  then  were  extracted  and  added 
to  the  2001  AFOQT  normative  data  file.  The  total  sample  had  406  cases.  All  ASVAB  scores 
used standard scores from the 1997 normative base, the only normative data base currently
available  for  Air  Force  personnel. 


Results 


AFOQT Sample Characteristics

The  basic  characteristics  of  the  AFOQT  normative  sample  are  shown  in  Table  3.  This 
sample initially contained 1,650 cases. After the cases with missing background variables were
removed, the final normative sample had 1,623 cases. Females made up 20% of the sample.
The normative sample was predominantly White (78%). Blacks and Hispanics constituted 8%
and 7% of the normative sample, respectively, with 1% of the sample composed of American
Indians and 5% of Asians. Thirty-one percent (31%) of the normative sample was obtained
from BMT, 40% from AFROTC, and 28% from OTS. All of the cases in the normative sample
had  at  least  a  GED  or  a  high  school  diploma.  The  mean  number  of  years  of  education  was  13.66. 

Table 3 shows the raw score mean, range of scores, and standard deviation for the five
AFOQT composites and 16 subtests plus 11 demographic measures. The NT Composite is
included  in  this  and  subsequent  tables  and  graphs  because  Army  aviator  training  may  include  a 
significant  amount  of  navigation  training.  In  such  a  case  the  Army  may  wish  to  combine  the  NT 
Composite  with  the  Pilot  Composite.  Mean  years  of  education  for  the  OTS  subsample  was  16.4 
because  only  college  graduates  can  go  to  OTS.  The  mean  years  of  education  for  the  AFROTC 
subsample  was  12.5,  and  for  the  BMT  subsample,  12.7  years.  The  mean  years  of  education  for 
the  AFROTC  cadets  was  low  because  the  AFROTC  detachments  tested  freshmen.  This  fact 
explains  why  the  AFROTC  cadets  score  very  differently  on  certain  subtests  from  the  enlisted 
personnel  in  the  BMT  subsample  despite  the  similarity  of  their  education  levels.  One  individual 
in  BMT  reported  having  doctoral  equivalent  years  of  education.  This  is  not  unusual  for  Air 
Force  basic  enlisted  personnel,  and  some  individuals  with  college  degrees  go  directly  from  basic 
training  to  OTS. 
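
The overall mean of 13.66 years of education can be cross-checked against the subsample figures. The following is a minimal sketch, assuming the subsample sizes and means reported in Tables 4 through 6; the function and variable names are illustrative and do not come from the original analysis.

```python
# Subsample sizes and mean years of education, copied from Tables 4-6.
# The dictionary and function names are illustrative assumptions.
subsamples = {
    "BMT": (509, 12.65),
    "AFROTC": (652, 12.53),
    "OTS": (462, 16.37),
}


def pooled_mean(groups):
    """Return the size-weighted mean across groups of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups.values())
    weighted_sum = sum(n * mean for n, mean in groups.values())
    return weighted_sum / total_n


total_n = sum(n for n, _ in subsamples.values())
overall_mean = pooled_mean(subsamples)
print(total_n, round(overall_mean, 2))
```

The pooled value agrees with the 13.66 years reported for the 1,623 cases in the final normative sample, which suggests the three subsample tables and the total-sample table describe the same cases.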

Table 3

AFOQT Normative Sample Statistics

Test                 n     Minimum  Maximum  Mean    SD
Composites
Verbal               1623  11       74       45.67   12.83
Quantitative         1623  7        75       46.82   13.97
Academic Aptitude    1623  23       147      92.49   24.07
Pilot                1623  38       195      122.45  28.14
Navigator/Technical  1623  46       257      163.80  37.78
Subtests
VA                   1623  3        25       16.01   4.30
AR                   1623  1        25       15.39   5.34
RC                   1623  2        25       14.43   4.67
DI                   1623  2        25       16.74   4.41
WK                   1623  1        25       15.23   5.89
MK                   1623  1        25       14.69   5.94
MC                   1623  1        20       10.39   3.63
EM                   1623  0        20       10.00   3.89
SR                   1623  4        40       25.02   7.36
IC                   1623  0        20       11.54   5.34
BC                   1623  0        20       12.37   4.00
TR                   1623  1        41       27.35   7.31
AI                   1623  0        20       9.77    4.47
RB                   1623  0        15       9.50    3.17
GS                   1623  0        20       12.40   3.88
HF                   1623  0        15       9.96    3.52
Demographics
Years Education      1623  12       21       13.66   2.00
Male                 1623  0        1        0.79    0.41
Female               1623  0        1        0.20    0.40
American Indian      1623  0        1        0.01    0.09
Asian                1623  0        1        0.05    0.21
Black                1623  0        1        0.08    0.28
Hispanic             1623  0        1        0.07    0.25
White                1623  0        1        0.79    0.41
BMT                  1623  0        1        0.31    0.46
ROTC                 1623  0        1        0.40    0.49
OTS                  1623  0        1        0.28    0.45

Tables 4, 5, and 6 show the AFOQT score distribution statistics separately for the three
sources: BMT, AFROTC, and OTS. Many of the test score distributions are shown
graphically later in the report. The graphs make it easier to distinguish those composites and
subtests that show little difference in test scores by source from those that show significant
source differences.


Table 4

BMT Subsample Statistics

Test                 n     Minimum  Maximum  Mean    SD
Composites
Verbal               509   11       72       38.51   12.03
Quantitative         509   7        70       38.20   11.22
Academic Aptitude    509   28       137      76.71   20.58
Pilot                509   11       176      107.00  23.32
Navigator/Technical  509   56       235      142.67  31.07
Subtests
VA                   509   4        24       14.01   4.16
AR                   509   2        25       12.81   4.66
RC                   509   3        24       12.50   4.55
DI                   509   4        24       15.01   4.06
WK                   509   1        25       12.00   5.37
MK                   509   1        25       10.38   4.52
MC                   509   1        18       9.25    3.32
EM                   509   0        19       8.63    3.69
SR                   509   4        39       22.41   6.99
IC                   509   0        20       8.93    4.73
BC                   509   0        20       11.24   3.90
TR                   509   2        41       25.10   6.65
AI                   509   0        19       7.44    3.08
RB                   509   1        15       8.90    2.99
GS                   509   1        19       10.18   3.42
HF                   509   0        15       8.77    3.49
Demographic
Years Education      509   12       21       12.65   1.15

Table 5

AFROTC Subsample Statistics

Test                 n     Minimum  Maximum  Mean    SD
Composites
Verbal               652   652      73       47.38   11.70
Quantitative         652   12       75       52.47   11.70
Academic Aptitude    652   23       147      99.85   21.48
Pilot                652   44       195      127.84  24.83
Navigator/Technical  652   46       257      174.88  32.96
Subtests
VA                   652   3        25       16.65   4.06
AR                   652   2        25       16.71   5.13
RC                   652   2        25       15.30   4.37
DI                   652   4        25       18.10   3.84
WK                   652   2        25       15.43   5.41
MK                   652   3        25       17.66   5.00
MC                   652   2        20       10.73   3.47
EM                   652   1        20       10.44   3.54
SR                   652   8        40       26.10   6.46
IC                   652   0        20       12.52   5.21
BC                   652   1        20       12.56   3.71
TR                   652   8        41       28.20   6.47
AI                   652   1        20       10.65   4.40
RB                   652   0        15       10.10   3.04
GS                   652   1        20       13.64   3.49
HF                   652   1        15       10.64   3.11
Demographic
Years Education      652   12       19       12.53   1.06

Table 6

OTS Subsample Statistics

Test                 n     Minimum  Maximum  Mean    SD
Composites
Verbal               462   15       74       51.16   11.63
Quantitative         462   10       75       48.33   14.14
Academic Aptitude    462   462      146      99.49   22.80
Pilot                462   38       194      131.87  30.36
Navigator/Technical  462   46       254      171.44  41.20
Subtests
VA                   462   6        25       17.31   3.99
AR                   462   1        25       16.37   5.32
RC                   462   2        25       15.35   4.58
DI                   462   2        25       16.72   4.86
WK                   462   2        25       18.49   5.16
MK                   462   2        25       15.24   5.75
MC                   462   1        20       11.18   3.85
EM                   462   1        20       10.88   4.17
SR                   462   4        39       26.38   8.17
IC                   462   0        20       13.04   5.11
BC                   462   1        20       13.33   4.22
TR                   462   1        41       28.65   8.46
AI                   462   0        20       11.10   4.88
RB                   462   0        15       9.32    3.39
GS                   462   0        20       13.09   3.84
HF                   462   1        15       10.29   3.79
Demographic
Years Education      462   12       21       16.37   0.87

Distributions of AFOQT Composite and Subtest Scores by Source

The  distribution  of  BMT  test  scores  on  the  Pilot  Composite  is  the  key  issue  of  this 
investigation.  The  Pilot  Composite,  however,  was  changed  with  the  implementation  of  AFOQT 
Form  S  in  early  2005.  The  old  Pilot  Composite  used  unit  weightings  for  its  eight  subtests.  The 
new  composite  uses  regression  weights: 

Pilot Composite = 1.2 AR + 1.0 MK + 1.9 IC + 1.0 TR + 2.4 AI

Figure 1 shows the distribution of the total normative sample scores on the new Pilot Composite
(P_NW). The mean is 106, and the distribution is somewhat platykurtic but nearly normal, with a
standard deviation of 30. Figure 2 shows that the distribution of scores for the BMT subsample is
slightly skewed to the right but suitable for making distinctions among members of this group.
However, the mean is 86, or two-thirds of a standard deviation below that of the total sample. The
BMT subsample scored much lower than the other groups, but the spread of scores is sufficient for
selection purposes if the BMT examinees are compared only among themselves.
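
The reported means of 106 and 86 can be reproduced from the subtest means, because the mean of a weighted sum equals the weighted sum of the means. The sketch below assumes the regression weights given for the new Pilot Composite and the subtest raw-score means in Table 3 (total sample) and Table 4 (BMT); the names are illustrative only.

```python
# Regression weights for the new Pilot Composite, as given in the report.
WEIGHTS = {"AR": 1.2, "MK": 1.0, "IC": 1.9, "TR": 1.0, "AI": 2.4}

# Subtest raw-score means from Table 3 (total sample) and Table 4 (BMT).
TOTAL_MEANS = {"AR": 15.39, "MK": 14.69, "IC": 11.54, "TR": 27.35, "AI": 9.77}
BMT_MEANS = {"AR": 12.81, "MK": 10.38, "IC": 8.93, "TR": 25.10, "AI": 7.44}

TOTAL_SD = 30.0  # standard deviation of the total sample, as reported


def pilot_composite(scores):
    """Weighted sum of the five subtest raw scores (or their means)."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


total_mean = pilot_composite(TOTAL_MEANS)
bmt_mean = pilot_composite(BMT_MEANS)
gap_in_sd = (total_mean - bmt_mean) / TOTAL_SD

print(round(total_mean), round(bmt_mean), round(gap_in_sd, 2))
```

Under these assumptions the composite means round to 106 and 86, and the gap is about two-thirds of a total-sample standard deviation, matching the text.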


Figure 1. New Pilot Composite for total sample.

Figure 2. New Pilot Composite for Basic Military Training (BMT) sample.

The new Pilot Composite distributions for the AFROTC and OTS subsamples are shown in
Figures 3 and 4. Their means are similar, and both distributions are slightly skewed
to the left.

Figure 3. New Pilot Composite for AFROTC sample.

Figure 4. New Pilot Composite for OTS sample.


Score distributions for the old Pilot Composite are similar to those for the new composite
for the total sample and the BMT subsample, as shown in Figures 5 and 6. The means are higher,
with no apparent skewness, but the enlisted personnel again scored about two-thirds of a standard
deviation lower than the total sample.


Figure  5.  AFOQT  Old  Pilot  Composite  for  total  sample. 


Figure  6.  AFOQT  Old  Pilot  Composite  for  BMT  sample. 

Figure  7  superimposes  the  distributions  for  all  three  sample  sources  on  the  graph  for  the 
old  Pilot  Composite.  For  Figure  7  and  all  subsequent  figures,  the  percent  is  based  on  the  total 
sample,  not  on  each  source.  The  AFROTC  mean  is  lower  than  the  OTS  mean,  and  the  BMT 
mean  is  lower  than  the  other  two.  Nevertheless,  the  distributions  are  adequate  to  differentiate 
among  candidates  within  each  source.  The  pattern  is  similar  for  the  NT  Composite  shown  in 
Figure  8. 

Figure 7. Old Pilot Composite for each source.

Figure 8. NT Composite for each source.

The  distributions  for  the  Verbal  Composite  in  Figure  9  show  that  the  mean  OTS  scores 
are  higher  than  the  AFROTC  scores  and  the  AFROTC  scores  are  much  higher  than  the  BMT 
scores.  Nevertheless,  discrimination  within  sources  is  sufficient  for  use  in  selection.  The  same 
conclusions  may  be  reached  for  the  Quantitative  Composite  in  Figure  10. 

Figure 9. Verbal Composite for each source.


Figure  10.  Quantitative  Composite  for  each  source. 

Figures 11 to 26 superimpose the distributions for the three sources on each of the 16
subtests. In general, the pattern of relationships is the same as the pattern for the composites:
the source distribution shapes are very similar and suitable for use in selection, and BMT scores are
slightly lower than scores from the other two sources. BMT scores are lowest on the verbal and
quantitative composites, their contributing subtests (Verbal Analogies, Arithmetic Reasoning,
Reading Comprehension, Word Knowledge, Math Knowledge, and Data Interpretation), and the
General Science subtest, but only slightly lower on most of the speeded subtests (see Figures 19
through 22). The OTS distributions show small ceiling effects on the Aviation
Information, Block Counting, and Table Reading subtests and a
moderate ceiling effect on the Word Knowledge, Rotated Blocks, and Instrument
Comprehension subtests. The AFROTC distributions show a small ceiling effect on the
Word Knowledge, Instrument Comprehension, and Rotated Blocks subtests. The Hidden
Figures subtest (Figure 24) shows a ceiling effect for the OTS and AFROTC cadets, but this
subtest is no longer used in any of the composite scores.


Figure 11. Verbal Analogies subtest for each source.

Figure 12. Arithmetic Reasoning subtest for each source.

Figure 13. Reading Comprehension for each source.

Figure 14. Aviation Information for each source.

Figure 15. Word Knowledge for each source.

Figure 16. Math Knowledge for each source.

Figure 17. Electrical Maze for each source.

Figure 18. Scale Reading for each source.

Figure 19. Instrument Comprehension for each source.

Figure 20. Block Counting for each source.

Figure 21. Table Reading for each source.

Figure 22. Rotated Blocks for each source.

Figure 23. General Science for each source.

Figure 24. Hidden Figures for each source.

Figure 25. Data Interpretation for each source.

Figure 26. Mechanical Comprehension for each source.

Gender, Ethnicity, and Source Differences in AFOQT New Pilot Composite

To evaluate gender, ethnicity, and examinee source differences, dichotomous (1/0)
variables were generated for each category, and the new Pilot Composite scores were iteratively
regressed on them. For each analysis, the variables of interest (e.g., gender) were removed from
the independent variable set while years of education and the effects of the other categories were
held constant, as shown in Table 7. Years of education was held constant to ensure that any new
Pilot Composite score differences identified were not attributable to education level differences
across gender, race, and source in the sample. The ethnic categories were American Indian, Asian
or Pacific Islander, Black, Hispanic, and White. The other dichotomous variables were female,
male, BMT, AFROTC, and OTS.

Table 7

Regression Summary

Model  Title      R     R2    Restriction Tested  df1  df2   F      Sig.
1      Full       .603  .364
2      Gender     .536  .288  Model 1 vs 2        1    1614  192.9  .001
3      Ethnicity  .552  .304  Model 1 vs 3        4    1619  38.2   .001
4      Source     .422  .178  Model 1 vs 4        2    1616  236.0  .001

Model  1  contains  all  the  predictors:  male,  female,  American  Indian,  Asian,  Black, 
Hispanic,  White,  BMT,  AFROTC,  OTS,  and  years  of  education.  The  criterion  is  the  new  Pilot 
Composite. Model 2 removes the female and male variables. Model 3 restores the gender
variables and removes the ethnicity variables. Model 4 restores the ethnicity variables and
removes the source variables.

As indicated in Table 7, 36.4% of the variance in the new Pilot Composite can be
accounted for by gender, ethnicity, source, and years of education. Imposing the restriction
on Model 1 that the gender variable weights are zero resulted in F(1, 1614) = 192.9, so the
restriction is rejected, p < .001. There are significant gender differences holding ethnicity, source,
and years of education constant. As shown, there are also significant ethnicity and source
differences.
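
The F ratios in Table 7 are consistent with the usual test of a restricted model against a full model, F = ((R2_full - R2_restricted) / df1) / ((1 - R2_full) / df2). The sketch below assumes that this is the test that was used (the report does not state the formula) and takes the R2 and degrees-of-freedom values from Table 7; the function name is illustrative.

```python
def restriction_f(r2_full, r2_restricted, df1, df2):
    """F ratio for testing a restricted model against the full model.

    df1 is the number of weights set to zero; df2 is the residual
    degrees of freedom of the full model.
    """
    return ((r2_full - r2_restricted) / df1) / ((1.0 - r2_full) / df2)


R2_FULL = 0.364  # Model 1 in Table 7

# (restricted R2, df1, df2) for Models 2-4, copied from Table 7.
restricted_models = {
    "gender": (0.288, 1, 1614),
    "ethnicity": (0.304, 4, 1619),
    "source": (0.178, 2, 1616),
}

f_values = {
    name: restriction_f(R2_FULL, r2, df1, df2)
    for name, (r2, df1, df2) in restricted_models.items()
}

for name, f_value in f_values.items():
    print(f"{name}: F = {f_value:.1f}")
```

With the rounded R2 values this gives 192.9 for gender and 38.2 for ethnicity, as tabled, and about 236.3 for source against the tabled 236.0; the small gap is plausibly due to rounding of R2 to three decimals.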

The  source  effects  are  the  greatest,  and  those  differences  are  most  important  for  the 
purpose  of  the  current  investigation.  After  taking  into  account  gender,  ethnicity,  and  education 
differences, the source differences are dramatic, i.e., the R2 for the full model (R2 = .364) drops to
.178 when the source variables are removed from the set of predictor variables. This result
suggests  that  where  there  are  large  differences  between  sources,  as  in  enlisted  versus  officer, 
common  selection  systems  should  use  independent  norms  and  minimum  standards.  When 
AFOQT  Form  N  was  normed,  education  level  standards  and  norms  were  used  to  rank  candidates 
(Gould,  1978).  Even  though  the  same  1978  data  were  used  to  norm  Forms  O  through  Q,  the 
education  level  conversion  tables  were  dropped  and  a  common  table  used  because  of 
administration  issues. 
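
To illustrate what independent norms would mean in practice, the sketch below ranks the same raw score against each source's own distribution. It uses the old Pilot Composite means and standard deviations from Tables 4 through 6 and assumes, purely for illustration, that scores within each source are approximately normal; operational norms would more likely rest on empirical percentile tables. The raw score of 120 and all names are hypothetical.

```python
from statistics import NormalDist

# Old Pilot Composite raw-score mean and SD by source, from Tables 4-6.
NORMS = {
    "BMT": (107.00, 23.32),
    "AFROTC": (127.84, 24.83),
    "OTS": (131.87, 30.36),
}


def within_group_percentile(raw_score, source):
    """Approximate percentile of a raw score within one source group.

    Assumes a normal distribution within the group, which is an
    illustrative simplification rather than the report's method.
    """
    mean, sd = NORMS[source]
    z = (raw_score - mean) / sd
    return 100.0 * NormalDist().cdf(z)


HYPOTHETICAL_RAW_SCORE = 120
percentiles = {
    source: within_group_percentile(HYPOTHETICAL_RAW_SCORE, source)
    for source in NORMS
}

for source, pct in percentiles.items():
    print(f"{source}: about the {pct:.0f}th percentile")
```

Under these assumptions the same raw score falls above the median for BMT examinees but below it for AFROTC and OTS cadets, which is the practical consequence of norming each source separately.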

The race and gender differences are consistent with past studies of cognitive
measurement differences between minority and majority gender and race groups. In this case the
difference persists even though Form S was found to minimize those differences compared to
Forms Q and R (Gould & Shore, 2003). Recent studies of group differences have indicated that
these results may be caused by statistical artifacts when there are large differences in the sizes of
the minority and majority groups, as is the case here: 80 percent of the sample is male and 79
percent of the sample is White (Charness & Gerchak, 1996).

The summary characteristics and intercorrelations of all the variables used in the regression
analyses are included in Appendix A. In addition, the characteristics and intercorrelations of
all the AFOQT subtests and composites are in Appendix A for those wishing to examine
the relationships between the measures or to conduct additional regression analyses.

ASVAB and AFOQT

The  ASVAB  data  available  for  this  sample  consisted  of  10  subtests:  General  Science 
(GS),  Arithmetic  Reasoning  (AR),  Word  Knowledge  (WK),  Paragraph  Comprehension  (PC), 
Numerical  Operations  (NO),  Coding  Speed  (CS),  Auto  and  Shop  Information  (AS),  Math 
Knowledge (MK), Mechanical Comprehension (MC), and Electrical Information (EI). There is
an  apparent  eleventh  subtest  called  VE,  but  it  is  simply  a  combination  of  WK  and  PC  and  is  an 
indicator  of  verbal  ability  much  like  the  Verbal  Composite  of  the  AFOQT.  For  making 
classification  decisions  from  the  ASVAB,  the  Air  Force  uses  four  ASVAB  composite  scores: 
General,  Mechanical,  Administrative,  and  Electronic.  Effective  January  2002,  NO  and  CS  were 
removed  from  the  ASVAB  and  Assembling  Objects  (AO)  was  added.  Because  the  available 
sample  entered  the  Air  Force  and  was  tested  on  the  ASVAB  and  AFOQT  in  2001  before  the 
change  occurred,  AO  scores  were  not  available  for  analysis.  The  characteristics  of  the  sample  on 
the  AFOQT  and  ASVAB  subtests  and  composites  and  the  intercorrelations  of  the  composites  and 
subtests  are  contained  in  Appendix  B. 

A  critical  issue  concerns  the  possibility  of  using  the  ASVAB  to  predict  performance  in 
helicopter  pilot  training  either  as  a  substitute  for  a  test  with  a  pilot  composite  or  as  an  interim 
indicator  of  who  might  do  well  on  such  a  test.  Regression  analyses  were  conducted  to  develop 
an ASVAB predictor of the new AFOQT Pilot Composite. Table 8 shows the results of using
the ASVAB subtests to predict the new Pilot Composite for the 406 enlisted airmen in the
AFOQT normative sample. The multiple R was .77. Thus, at best the ASVAB can account for
60 percent of the variance in the new Pilot Composite (R² = .60). The standard error of the
estimate (SE) was 15.54. Because NO and CS scores will not be available for more recent Army
enlisted applicants, the regression equations were recalculated without these two subtests. After
the removal of the NO and CS subtests, the variance accounted for drops to 57 percent (R² =
.574, R = .757, and SE = 16.01). The regression coefficients are shown in Table 9. ASVAB
subtest  scores  may  not  be  readily  available  for  operational  use,  but  predicting  the  new  Pilot 
Composite  from  available  Army  composite  scores  would  lower  the  predictive  efficiency  even 
more  unless  the  new  AO  subtest  contributes  substantial  variance  to  the  predictive  equation. 

Using the ASVAB subtests to predict the old Pilot Composite accounts for only 49.6 percent of
the variance. Thus, predicting the old composite is not a viable option.

Table 8

Regression Coefficients Predicting the AFOQT New Pilot Composite from ASVAB Subtests

                  Regression Statistics
Subtest       Coefficient    Standard Error
Intercept       -187.01          15.80
GS STD             0.72           0.20
AR STD             1.02           0.21
WK STD            -2.19           1.49
PC STD            -0.85           0.69
NO STD             0.23           0.15
CS STD             0.45           0.13
AS STD             0.25           0.14
MK STD             0.79           0.17
MC STD             0.69           0.15
EI STD             0.36           0.17
VE STD             3.39           2.09

Note: More detail on the regression equation and values are given in Appendix B. STD = a
standard T score with a mean of 50 and a SD of 10.


Table 9

Regression Coefficients Predicting AFOQT New Pilot Composite from ASVAB Subtests
Without NO and CS

                  Regression Statistics
Subtest       Coefficient    Standard Error
Intercept       -165.547         15.573
GS STD             0.689          0.202
AR STD             1.224          0.210
WK STD            -1.267          1.529
PC STD            -0.313          0.710
AS STD             0.160          0.144
MK STD             0.930          0.170
MC STD             0.651          0.152
EI STD             0.365          0.170
VE STD             2.034          2.146

Note: STD = a standard T score with a mean of 50 and a SD of 10.
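
As an illustration of how the Table 9 equation would be applied, the sketch below computes a predicted new Pilot Composite score from a set of ASVAB subtest T scores. The coefficients and the standard error of the estimate are copied from Table 9 and the text; the function name and the example examinee are hypothetical and are not part of the original analysis.

```python
# Coefficients copied from Table 9 (ASVAB subtests without NO and CS).
INTERCEPT = -165.547
COEFFICIENTS = {
    "GS": 0.689, "AR": 1.224, "WK": -1.267, "PC": -0.313, "AS": 0.160,
    "MK": 0.930, "MC": 0.651, "EI": 0.365, "VE": 2.034,
}
SE_ESTIMATE = 16.01  # standard error of the estimate reported in the text


def predict_pilot_composite(t_scores):
    """Return the predicted new Pilot Composite for a dict of ASVAB T scores."""
    missing = set(COEFFICIENTS) - set(t_scores)
    if missing:
        raise ValueError(f"missing subtest scores: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[k] * t_scores[k] for k in COEFFICIENTS)


# Hypothetical examinee scoring at the T-score mean (50) on every subtest.
average_examinee = {name: 50 for name in COEFFICIENTS}
predicted = predict_pilot_composite(average_examinee)
print(f"Predicted composite: {predicted:.1f} (SE of estimate = {SE_ESTIMATE})")
```

Because the standard error of the estimate is about 16 points, any single predicted score carries a wide margin of error, which is consistent with treating the ASVAB conversion as an interim estimate only.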


The AFOQT and ASVAB subtest and composite intercorrelations in Appendix B reveal that
the  highest  intercorrelation  is  .79  between  the  ASVAB’s  General  Composite  and  the  Academic 
Aptitude  Composite  (AA)  of  the  AFOQT.  The  largest  surprise  is  that  the  correlation  between  the 
ASVAB’s  Administrative  Composite  (WK+PC+NO+CS)  and  AFOQT’s  Verbal  Composite  is 
only  .39  while  the  ASVAB’s  Word  Knowledge  subtest  alone  correlates  .71  with  the  AFOQT 
Verbal  Composite.  The  ASVAB’s  AFQT  composite  correlates  .83  with  the  AFOQT’s  AA 
composite. 

Five subtests in the AFOQT have direct counterparts in the ASVAB: AR, WK, PC
(called Reading Comprehension in the AFOQT), MK, and MC. Over the years, test developers
for the AFOQT gave the developers of the ASVAB some test items that were too easy or too
difficult for their own test, and vice versa. Thus, the counterpart subtests have similar items
as well as similar intent. Given the short interval between test administrations and the
similarity in content and intent of the counterpart subtests, the correlations from the sample
between the counterpart subtests were lower than expected: AR = .70, WK = .71, PC/RC = .42,
MK = .71, and MC = .55.
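
One way to read these correlations is in terms of shared variance, the square of each correlation. The sketch below uses only the five values reported above; it makes no correction for subtest unreliability or range restriction, since those figures are not given here.

```python
# Counterpart-subtest correlations reported in the text above.
counterpart_r = {"AR": 0.70, "WK": 0.71, "PC/RC": 0.42, "MK": 0.71, "MC": 0.55}

# Shared variance between each ASVAB subtest and its AFOQT counterpart.
shared_variance = {name: r ** 2 for name, r in counterpart_r.items()}

for name, r2 in shared_variance.items():
    print(f"{name:6s} r = {counterpart_r[name]:.2f}  shared variance = {r2:.1%}")
```

Even the most closely related pairs share only about half of their variance, and the reading subtests share less than a fifth.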


Conclusions 

As expected, the AFOQT was more difficult for the Air Force enlisted personnel than for
other commissioning source applicants. However, the subtest and composite score distributions
are sufficient to discriminate well among enlisted personnel if the AFOQT or a similar aptitude
test is used for selection. On the highly timed subtests of the Pilot Composite, such as the
Instrument Comprehension and Table Reading tests, there is almost no difference between the
examinee subpopulations. A common selection system for all Army helicopter pilot applicants
appears to be practical, but separate group norms probably will be required so that individual
applicants are rank ordered or eliminated relative to the norms for the applicant's source group.
Direct conversion from ASVAB to AFOQT results is not recommended except as an interim
estimate of how the enlisted personnel are likely to do on the AFOQT, particularly the new
Pilot Composite.
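
The idea of separate group norms can be sketched as a percentile rank computed within each source group rather than across the pooled sample. The score lists below are invented purely for illustration and do not come from the normative sample; the function and group names are likewise hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percentile rank of a score within a norm group (mid-rank for ties)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm groups, invented for illustration only.
norms = {
    "enlisted": [35, 40, 45, 50, 55, 60, 65, 70],
    "officer": [50, 55, 60, 65, 70, 75, 80, 85],
}

raw_score = 60
ranks = {group: percentile_rank(raw_score, scores) for group, scores in norms.items()}
for group, rank in ranks.items():
    print(f"Raw score {raw_score} within {group} norms: percentile {rank:.2f}")
```

With these made-up numbers the same raw score falls well above the middle of one group and well below the middle of the other, which is the situation in which a single pooled table would rank applicants very differently from source-specific tables.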

How  do  the  enlisted  personnel  from  the  normative  sample  compare  to  individuals  who 
are  applicants  for  Army  helicopter  training?  For  comparison  of  the  BMT  enlisted  part  of  the 
normative  sample  to  those  enlisted  personnel  applying  for  Army  aviation,  the  BMT  subsample 
had a mean AFOQT score of 74.1 (SD = 13.0). The ASVAB scores will also help gauge the
general  intelligence  level  of  the  personnel  in  the  USAF  enlisted  sample  for  comparison  with 
Army  data.  The  results  should  help  the  U.S.  Army  determine  the  appropriate  difficulty  level  for 
the  cognitive  portions  of  its  revised  aviator  selection  battery  and  the  appropriateness  of  the 
AFOQT  for  Army  aviator  selection. 

APPENDIX A. AFOQT Form R and S Normative Sample Intercorrelations and Variable
Characteristics

Table A1

Descriptive Statistics for AFOQT Form R and S Normative Sample

APPENDIX B. AFOQT 2001 Normative Sample that has ASVAB Scores Available: Variable
Characteristics and Intercorrelations

Table B1

Descriptive Statistics for AFOQT 2001 Normative Sample with AFOQT

Table B3

Regression Summary for AFOQT 2001 Normative Sample with AFOQT